Europäisches Patentamt
European Patent Office               Publication number: 0 208 457
Office européen des brevets                                     A2

EUROPEAN PATENT APPLICATION

Application number: 86304840.1       Int. Cl.4: G 06 F 15/16

Date of filing: 24.06.86






Priority: 09.07.85 GB 8517376

Date of publication of application:
Bulletin 87/3

Designated Contracting States:
DE FR NL



Applicant: NATIONAL RESEARCH DEVELOPMENT
CORPORATION
101 Newington Causeway
London SE1 6BU (GB)

Inventor: Jesshope, Christopher Roger
Dept. of Electronics
University of Southampton
Southampton SO9 5NH (GB)

Representative: Pears, David Ashley et al,
REDDIE & GROSE 16 Theobalds Road
London WC1X 8PL (GB)



Processor array.

A processor array comprises a plurality of identical processing
elements 18, 28 etc. each capable of combining operands to produce
results and, after arithmetic operations, carry data.

Each element comprises means for selecting the element from which
it takes an input, which may be an operand or carry data. This
enables the array to be configured in a versatile manner to form
composite processing units. Amongst those possible are a cyclic
10-bit shift register 22, a 9-bit ripple-carry adder 20, one-bit
processing units 28 for bit-serial processing, a two-bit
ripple-carry adder 26 and a two-element unit 24 whose elements
exchange data.

The present invention relates to processor arrays comprising a
plurality of processing elements each capable of combining operands
to produce results and carry data, control means operable to control
the processing elements by supplying single instructions to all of
the elements simultaneously, and a connection network connecting
each processing element to a plurality of other processing elements
for the transmission of data between elements. The processor array
is a computer design which makes use of a large array of processors,
or "processing elements", each of which is capable of processing
input data independently of the other processing elements in the
array. The processing elements are all controlled by a central
control unit. Commonly, the central control unit broadcasts the
same instructions, known as global instructions, simultaneously to
all of the processing elements, which execute the instructions on
respective data. The array is then said to be operating "in
parallel".

The processing elements are interconnected by data paths which
allow a result of an operation performed by one processing element
to be passed to another element. Normally, only neighbouring
elements are connected in this way.

The most common architecture used for processor array computers
consists of single-bit processing elements arranged to form a
2-dimensional array. For instance, they may be arranged to form a
square lattice, with each element being connected to its nearest
four or eight neighbours. The data paths between elements are
opened and closed by a switching network which enables data to be
routed through the array along chains of processing elements.

Some previous proposals have extended the possible applications
of processor arrays by storing in each processing element a single
bit of control information, known as an activity bit. The setting
of the activity bit determines whether the corresponding processing
element is enabled or disabled to respond to global instructions.
The activity bit can be set according to the result of a previous
operation, and so allows conditional instructions to be executed.
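
The role of the activity bit can be illustrated with a minimal sketch.
The function name `broadcast` and the list-based model of the array are
assumptions made for illustration only; they do not describe any
particular prior proposal.

```python
def broadcast(instruction, data, activity):
    """Apply one global instruction to every enabled element.

    data[i] is the value held by element i; activity[i] is its
    activity bit. Disabled elements keep their data unchanged.
    """
    return [instruction(x) if active else x
            for x, active in zip(data, activity)]


# Step 1: every element tests its own data and sets its activity bit.
data = [3, -1, 4, -1, -5]
activity = [x < 0 for x in data]

# Step 2: a global "negate" is obeyed only where the activity bit is
# set, which amounts to a conditional "if negative then negate".
result = broadcast(lambda x: -x, data, activity)
print(result)
```

All elements receive the same instruction; the locally stored bit alone
decides whether it takes effect.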

A data word having more than one bit may be processed in an
array of single-bit processing elements by controlling a single
element to perform an appropriate sequence of operations on the bits
of the word. This is known as bit-serial processing. The reduction
in processing speed which is a consequence of serial processing is
compensated for by the ability of the array to process a large
amount of data at one time, each data word being processed by a
respective element of the array. However, it is naturally important
to ensure that the processing resources available are used to best
effect, but this is not always possible. The parallelism of the
array, that is, the maximum number of data words which can be
processed simultaneously, is determined when the array is built. In
contrast, the parallelism required, that is, the number of data
words simultaneously to be processed, depends on the problem being
solved. The parallelism required may vary between problems and at
different stages in the solution of a problem. Consequently,
matching the parallelism required to the parallelism available, so
as to use the resources with maximum efficiency, is a complex
operation which may become excessively arduous. If the problem does
not fit the array, some inefficiencies will inevitably occur, either
because some processing elements lie idle or because a large amount
of processing power is used simply trying to adapt the problem to
fit the array.
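
Bit-serial processing as outlined above can be sketched as follows.
This is an illustrative model only: the helper names (`to_bits`,
`from_bits`, `bit_serial_add`, `array_add`) are hypothetical, and a
Python loop stands in for the elements that would operate in parallel.

```python
def to_bits(value, width):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two words in one single-bit element, one bit per step."""
    carry = 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def array_add(words_a, words_b, width):
    """Each pair of words is handled by its own element. In hardware all
    elements would perform the same step at the same time, so the time
    taken is `width` steps however many pairs there are."""
    return [from_bits(bit_serial_add(to_bits(a, width),
                                     to_bits(b, width))[0])
            for a, b in zip(words_a, words_b)]
```

The number of word pairs that can be handled at once is fixed by the
number of elements, which is the parallelism "determined when the array
is built".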

Several references in the literature describe arrays whose
parallelism may be changed during use. Flanders et al ("Efficient
High Speed Computing with the Distributed Array Processor", High
Speed Computer and Algorithm Organisation, Academic Press, London
1977) describe an array in which a row of processing elements may be
linked together to form a bit parallel processor in which all of the
bits of an operand are processed simultaneously, by respective
processing elements, and carry data "ripples" between the elements,
along the row. Another proposal which is similar in this respect is
described by Arvind et al ("A VLSI Chip for real-time image
processing", Proc. IEEE International Symposium on Circuits and
Systems, May 1983, pp 405-408).

The present inventor has previously described ("A
Reconfigurable Processor Array for VLSI", Proc. Workshop on
Advances in the use of Vector and Array Processors, Bristol 1982) an
array in which each processing element stores a bit of control
information in addition to an activity bit, and known as a
reconfiguration control bit. A processing element in a row of
elements performing bit parallel processing rejects or accepts carry
data from its neighbour according to the setting of the
reconfiguration control bit. The rows of elements can therefore be
split into processing units of arbitrary length.

Kondo et al ("An LSI Adaptive Array Processor", IEEE Journal of
Solid-State Circuits, Vol SC-18, No. 2, April 1983, p 147) describe
an array in which each processing element may send a carry bit to
its left or lower neighbouring element and may accept or reject a
carry bit sent by its upper or right neighbouring element. This
arrangement enables rows or columns performing bit-parallel
operations to be sub-divided. A combination of both types of
sub-division enables operations, called "block operations" by Kondo,
to be performed. For instance, several data words may be stored in a
2-dimensional block of processing elements, each word being stored
along a respective row of elements, the words being added by a
combination of bit-parallel, ripple carry additions along the rows
and down the columns of the block, to produce the sum of all of the
words, stored along the bottom row of the block.

The object of the present invention is to provide a processor
array which is versatile enough to be used with a wide range of
problems and which reduces the difficulty of matching the
parallelism required and the parallelism available.

The present invention is characterised in that each element can
send operands or carry data, selectively, to any other element to
which it is connected by the connection network, and in that each
element comprises storage means for storing a direction selection
code and direction selection means responsive to the contents of the
storage means to select the element from which transmitted data is
accepted or the element to which data is transmitted.

'Operand' is used here to mean either unprocessed data or
result data stored after an operation, for further processing or
transmission to another element. The term is used in contrast to
the terms 'carry data' and 'control data'.

The processing elements of a processor array according to the
invention may be single-bit processing elements. The array can
then be considered as a pool of single-bit bit-slices which can be
linked together under program control, by storing appropriate
direction selection codes in the elements, to form an array of
multi-bit processing units of almost arbitrary size and shape. That
is, the shape of a line of processing elements forming a processing
unit may meander through the array with no constraints on its shape
other than those applied by the layout of the connection network.
Thus, the configuration of the array is very flexible, thereby
greatly simplifying the problem of matching the array to a problem
to be solved.

Preferably, each processing element comprises further storage
means for storing a further direction selection code and further
direction selection means responsive to the further direction
selection code and wherein the direction selection means of each
processing element select the element from which data is accepted
and the element to which data is transmitted in dependence on the
stored direction selection codes. The use of input and output
selection codes together provides a useful symmetry in the
processing element. This can be used, for instance, to reverse the
direction of data flow in a processing unit simply by interchanging
the significance of the stored codes, so that the input selection
code becomes the output selection code and vice-versa.

Each connection between processing elements preferably
comprises two data paths, one for each direction of transfer. This
prevents data collisions occurring, for instance when two elements
wish to exchange data.

Preferably, the processing elements are interconnected by
connections extending in directions which are so chosen that closed
rings of intercommunicating elements may be formed by the operation
of the direction selection means. For instance, the processing
elements may form a two-dimensional array such as a square array,
with each processing element being connected to the four elements
which are its nearest neighbours on the lattice. The ability to
form closed rings of processing elements increases the range of
functions which a processing unit may perform to include certain
important functions to be described below.

In a preferred embodiment, each processing element comprises a
configuration store which stores an instruction from a set of
configuration instructions, and comprises means responsive to the
stored configuration instruction to modify the response of the
processing element to instructions from the control means, and
wherein respective configuration instructions determine whether a
processing element acts as the most significant or the least
significant portion of, or a portion of intermediate significance
in, a processing unit formed by a chain of processing elements each
of which communicates with the elements to either side of it in the
chain. Preferably one configuration instruction causes the
processing element in which it is stored to transmit received data
without processing.

The provision of direction selection means and configuration
means within the elements enables local control of operations. For
instance, when words of data are being processed in the array, the
direction of propagation of data and boundary effects at the edges
of processing units can be locally determined for arithmetic and
shift operations. This local control can also be used for other
purposes. It can be used to allow efficient manipulation of a wide
range of data structures, such as strings, trees and arrays, not
necessarily conformal with the array of processors. Local control
of data flow within a structure and efficient management of the
boundaries of the structure are then possible.

One embodiment of the present invention will now be described
more fully, by way of example, with reference to the accompanying
drawings, in which:

Fig. 1 shows part of a known processor array;

Fig. 2 shows an array according to the invention, arranged to
prevent data collisions;

Fig. 3 is a simplified, schematic representation of a
processing element for use in an array according to the invention;

Fig. 4 shows part of an array according to the invention,
configured to form various types of processing unit;

Figs. 5 to 12 show further examples of types of processing unit
which can be configured; and

Fig. 13 shows, in more detail, the processing element of Fig. 3.

Fig. 1 shows part of a processor array 10 similar to those
described in the prior art discussed above. The array 10 comprises
processing elements 12 laid out to form a square lattice. A
connection network provides a connection between each element 12 and
its four nearest neighbours. The connections are indicated in
Fig. 1 by double-headed arrows 14, used to indicate that data can be
passed in either direction along the connections. The condition
shown in Fig. 1, in which each processing element is connected to
two other elements in opposite directions in each dimension of the
array, is the condition normally adopted in the design of processor
arrays. In principle, elements could be connected to any number or
all of the other elements, but the complexity of wiring required
quickly increases with an increase in the number of connections
provided for each element. Each processing element also has a
section of memory, from which operands can be read and into which
results can be stored. A switch, not shown in the drawing, is
associated with each connection, to open and close the connection,
thereby controlling the data routes available at any given time.

When the processor array is controlled so that all information
flows in the same direction at any one time, which is the case in
the proposals of Arvind et al and Flanders et al, the connection
network shown in Fig. 1 is efficient. In particular, there is no
possibility of collisions occurring between data being driven in
opposite directions along the same connection.

In an array according to the present invention the use of
direction selection codes enables two processing elements to
exchange data, for instance by each selecting the other as their
source of input data. Data exchange over the same connection would
lead to data collision.

Two possibilities exist for the resolution of the problem. In
the first, two data paths are provided by each connection. Each
element can then drive data out in all directions if the elements
select the input direction, or can accept input from any direction
if the elements select output directions, without data collisions
occurring.

This alternative is shown in Fig. 2, in which arrowheads are
used to indicate the direction of data flow over each connection.
Each element drives data in all four available directions, and
receives data from all four directions, although data from only one
direction is accepted as an input.

Alternatively, each processing element could select both input
and output directions. If only one data path is provided by each
connection, data collisions would still occur when two elements
exchange data.

The preferred embodiment of the present invention combines the
two approaches by selecting input and output directions and
providing two data paths in each connection.

Fig. 3 shows, schematically, a processing element 41 of the
preferred embodiment. The processing element 41 is shown in more
detail in Fig. 13 and is described in more detail later, with
reference to that figure. The element 41 has four input lines 42,
arriving from respective neighbours. The direction of the
neighbours with respect to the element shown is indicated by the
compass points N, E, S, W shown next to the appropriate line. The
element also has four output lines 50 going to respective
neighbours. Thus, two data paths exist between each connected pair
of elements. Each data path 42, 50 is two bits wide.

The element stores two direction selection codes, in stores
43a, 43b. The code in store 43a controls direction selection means
44 to select one of the input lines for connection to the circuitry
within the element 41. The code stored in the store 43b controls
the output direction selection means 48 to select the output line 50
along which the output of the element is to be sent.

An input from another element may be either two operand bits,
during operations in which data is being moved around the array, or
a carry bit, during bit-parallel arithmetic operations.

Operand data arriving over the selected input line 42 provides
two component bits which pass along the result buses P and S to the
memory 45 of the element 41. Carry data is routed to carry
circuitry labelled "Carry" in Fig. 3.

Two bits of operand data, normally corresponding bits of
respective operands, may be read from the memory 45 along the
operand buses T and N. Operand bits on the buses T and N may be
used in two ways. They may serve as inputs to logic circuitry 56
which can combine them according to logical functions to generate a
result which is stored in the memory 45 over the result bus P.
Secondary logic circuitry 58 can perform logical functions on
operands on T and N independently of the circuitry 56 and store its
result over the bus S. The primary, secondary and carry circuits
56, 58, "Carry" can co-operate to perform arithmetic operations on
operands on T and N, to produce a result stored over P. During
arithmetic operations, the circuitry "Carry" takes into account any
carry data from the earlier operations. This may have been stored
in the circuitry, or received from another element. The carry
circuitry operates with the circuits 56, 58 to produce carry data for
later operations.

Data on the buses T and N may alternatively form the output of
the element 41, for transmission over the selected line 50 to one of
the neighbouring elements.

Two other types of output are possible. Carry data produced
during arithmetic operations can be sent to any of the neighbouring
elements, and an input from a line 42 to the element 41 can be sent
straight to the output circuitry, over the bus 52, without
processing. It can be sent to the memory 45, simultaneously. A
gating circuit 54 selects whether the output is operand data from
the memory 45, carry data from the carry circuitry, or input data on
the bus 52.

Global instructions are received by a control circuit 66 from a
global controller 67 which supplies instructions simultaneously to
all of the elements in the array. The circuit 66 decodes the global
instructions and a configuration code stored in a store 43c, to
provide control signals for the various components of the element.
The control signals are sent to the components over control lines
which are not shown in Fig. 3 for reasons of clarity. The global
controller 67 is not shown in the remaining figures, in the
interests of clarity.

Fig. 4 schematically shows part of a processor array 16
according to the invention, and comprising one-bit processing
elements 18 arranged on a square lattice. In Fig. 4, only data
routes which have been selected to convey data are shown. The
type of data being transmitted is indicated by the letters C (for
"carry") and O (for "operand") next to the corresponding arrow.

The ability of the processing elements to determine the direction
of information flow enables groups of processing elements to link
themselves together in a wide variety of ways, to form composite
units, here called processing units. The process of linking
elements together is here referred to as configuring the array.

The portion of the array shown in Fig. 4 has been configured to
provide a variety of types of processing unit. Each unit is
indicated by a dotted line surrounding the processing elements 18
forming the unit.

One processing unit 20 is formed by a chain of nine processing
elements, and so forms a 9-bit processor. The unit 20 may be used
(as shown) in 9-bit, bit parallel arithmetic operations involving a
carry rippling along the unit 20. Alternatively, the unit 20 could
be used as a non-recycling 9-bit shift register, for instance, by
passing operand data from element to element along the chain.

A chain of processing elements forming a single processing unit
may meander through the array following any course, by storing
appropriate direction selection codes in the processing elements.
The only limitation on the shape of the unit is imposed by the
connection network. In the present case, the limitation is that
adjacent processing elements in a processing unit must be nearest
neighbours in the array because only nearest neighbours are
connected.
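
The way a chain follows stored direction codes can be sketched as
below. The sketch is illustrative only: the function name `trace_unit`,
the 3 x 3 array and the particular course are assumptions, and the
numeric codes follow the encoding given later in Table 1.

```python
# Direction codes as encoded in Table 1 of this description.
NORTH, EAST, WEST, SOUTH = 0, 1, 2, 3
STEP = {NORTH: (-1, 0), EAST: (0, 1), WEST: (0, -1), SOUTH: (1, 0)}


def trace_unit(start, out_codes, rows, cols):
    """Follow output direction codes from `start` and return the list of
    (row, column) positions making up the processing unit.

    Only nearest-neighbour steps are possible, which is the one
    constraint the connection network places on the unit's shape.
    """
    path = [start]
    r, c = start
    for code in out_codes:
        dr, dc = STEP[code]
        r, c = r + dr, c + dc
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError("chain leaves the array")
        if (r, c) in path:
            raise ValueError("chain revisits an element")
        path.append((r, c))
    return path


# A hypothetical 9-element unit meandering through a 3 x 3 array.
snake = [EAST, EAST, SOUTH, WEST, WEST, SOUTH, EAST, EAST]
unit = trace_unit((0, 0), snake, rows=3, cols=3)
print(unit)
```

Any other course made of nearest-neighbour steps would serve equally
well; the snake is chosen only because it visits every element once.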

Another processing unit 22 shown in Fig. 4 is formed by a
closed loop of ten processing elements each sending operand data to
a neighbour. The element 22a at one end of the unit 22 receives
operand data from the element 22b at the other end of the unit 22.
The unit 22 therefore acts as a 10-bit recycling shift register.

Another processing unit 24 is a 2-bit unit enclosed by the
processing unit 22. The elements of the unit 24 are shown
exchanging operand data. These are the only elements shown using
both data paths in a single connection, although two paths are
available in every connection.

A 2-bit processing unit 26 can perform bit-parallel arithmetic
operations with a carry bit being sent from the upper element 26a to
the lower element 26b of the unit.

Finally, two elements 28 are configured as one-bit processing
units, such as would be used for bit-serial operations.

The use of both input and output direction selection codes
provides a useful symmetry in the array. For instance, the
processing unit 22 is shown shifting data in a clockwise direction.
The shifting direction can be reversed simply by interchanging the
significance of the stored direction selection codes, so that the
input direction selection code becomes the output direction
selection code and vice-versa.

In practice, the storage for the direction selection codes
might not be distributed throughout the array, but rather form a
single block of memory. The memory would be notionally divided
into sections, each section uniquely associated with a single
processing element. In this sense, each section of memory could be
considered as part of the corresponding processing element. Access
to the memory would also be available to control circuits outside
the array. The configuration of the array could then be set and
changed rapidly by these control circuits loading direction
selection codes into the appropriate regions of memory. Sections
of the memory would also be allocated to the processing elements to
allow them to read operands for processing, and to store results.

In the interests of maximising the integration of
circuits and minimising the number and length of data and control
lines, it will often be preferable to provide memory within each
processing element, as shown in the drawings.

However memory is arranged, input to a processing element can be
from a neighbouring element or from its associated memory, or from
both. For instance, during arithmetic operations in a multi-bit
processing unit, operands would be read from memory and combined
with a carry transmitted from a neighbouring element (except at the
least significant end of the unit) to produce a result for storage
in the memory and a carry for transmission to another element.

During a shifting operation, data would be read from memory, 
transmitted to a neighbouring element and stored in the memory of 
the recipient. 

Further examples of how the array may be configured are shown
in Figs. 5 and 6. In these figures, a numeral is shown in each
processing element. These numerals represent input direction
selection codes encoded according to Table 1 below. The codes may
be stored in practice as two bits d1 and d0 as shown in the table.
The output direction selection codes are not shown in the figures,
but are also two-bit binary codes according to Table 1, and set to
match (but not equal) the input direction selection codes, so that a
first processing element which selects a second processing element
as a recipient is in turn selected by the second processing element
as a sender of data.

Table 1

Direction field     Value     Direction
  d1     d0
  0      0            0       North
  0      1            1       East
  1      0            2       West
  1      1            3       South
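
The encoding of Table 1, and the sense in which an output code must
"match (but not equal)" the neighbour's input code, can be sketched as
follows. The function names are hypothetical. The complement rule used
in `matching_code` is an observation drawn from the table (opposite
directions happen to have complementary codes), not something the text
states.

```python
DIRECTIONS = {0: "North", 1: "East", 2: "West", 3: "South"}


def decode(d1, d0):
    """Direction named by the two stored bits d1 (high) and d0 (low)."""
    return DIRECTIONS[(d1 << 1) | d0]


def matching_code(out_code):
    """Input code the recipient must hold to accept data that the sender
    transmits with `out_code`.

    A sender transmitting to its East neighbour is, from that
    neighbour's point of view, to the West. In Table 1 each direction
    and its opposite are bitwise complements, so the matching code is
    3 - out_code.
    """
    return 3 - out_code


for code, name in DIRECTIONS.items():
    print(name, "->", DIRECTIONS[matching_code(code)])
```

The matching code is never equal to the code it matches, which agrees
with the parenthetical remark above.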

Fig. 5 shows twelve processing elements configured as six
processing units each comprising two processing elements. The
processing elements of each unit exchange data, and so the overall
effect of this configuration is for the data stored in adjacent
columns of the array to change places.
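
A column exchange of this kind can be sketched as below. The layout is
an assumption: the text says only that there are twelve elements in six
two-element units, so a 3 x 4 arrangement with horizontally paired
elements is used here for illustration, and `transfer` is a
hypothetical name.

```python
# Input direction codes as encoded in Table 1.
NORTH, EAST, WEST, SOUTH = 0, 1, 2, 3
OFFSET = {NORTH: (-1, 0), EAST: (0, 1), WEST: (0, -1), SOUTH: (1, 0)}


def transfer(data, in_codes):
    """One global 'move operand' step: every element stores the operand
    held by the neighbour that its input direction code selects."""
    rows, cols = len(data), len(data[0])
    result = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dr, dc = OFFSET[in_codes[r][c]]
            sr, sc = r + dr, c + dc
            if not (0 <= sr < rows and 0 <= sc < cols):
                raise ValueError("selected neighbour is outside the array")
            result[r][c] = data[sr][sc]
    return result


# Assumed configuration: in each row the elements pair up left to right,
# the left element taking input from the East and the right from the West.
in_codes = [[EAST, WEST, EAST, WEST] for _ in range(3)]
data = [[10, 11, 12, 13],
        [20, 21, 22, 23],
        [30, 31, 32, 33]]
swapped = transfer(data, in_codes)
print(swapped)
```

Because both elements of a pair read before either writes, the sketch
corresponds to the case in which each connection provides two data
paths, so the exchange involves no collision.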

Fig. 6 shows a single, closed loop processing unit comprising 
six processing elements. This could be used as a recycling shift 
register or for ripple carry processing in which the overflow carry 
resulting at the most significant element is available to the least 
significant element at the end of the operation. 

The processor array so far described allows great flexibility 
in the way the processing elements are connected together because 
the direction of data transfer is controlled at each processing 
element and any data stored or generated by an element can be passed 
to any other element to which it is connected. 

The preferred embodiment also stores a further instruction in
each processing element, known as the configuration instruction.
This instruction modifies the response of the associated processing
element to global instructions, for instance to determine any
special action which may be required at the edges of a processing
unit. The configuration instruction determines whether the
processing element acts as an interior element of a processing unit
having another element to either side, or as the element at the most
or least significant edge of the processing unit. For instance,
during ripple carry arithmetic, an interior element will receive and
propagate carry data, the most significant bit will receive but not
propagate carry data, and the least significant bit will propagate
but not accept carry data.
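
The effect of the three edge roles on ripple carry addition can be
sketched as follows. This is a simplified model under stated
assumptions: the elements are taken in order along a single row, the
labels "LS", "bit" and "MS" stand for the three configurations, the
function name `ripple_add_row` is hypothetical, and the overflow carry
at each most significant element is simply dropped.

```python
def ripple_add_row(configs, a_bits, b_bits):
    """Bit-parallel addition along a row of one-bit elements.

    configs[i] is "LS", "bit" or "MS" for element i. Each element adds
    its own pair of bits; an "LS" element does not accept a carry and
    an "MS" element does not propagate one.
    """
    sums = []
    link = 0  # carry present on the connection from the previous element
    for cfg, a, b in zip(configs, a_bits, b_bits):
        carry_in = 0 if cfg == "LS" else link
        sums.append(a ^ b ^ carry_in)
        carry_out = (a & b) | (carry_in & (a ^ b))
        link = 0 if cfg == "MS" else carry_out
    return sums


# Eight elements configured as two independent 4-bit units.
configs = ["LS", "bit", "bit", "MS", "LS", "bit", "bit", "MS"]
a_bits = [1, 1, 1, 1,  1, 1, 0, 0]   # units hold 15 and 3 (LS bit first)
b_bits = [1, 0, 0, 0,  0, 0, 1, 0]   # units hold 1 and 4
sums = ripple_add_row(configs, a_bits, b_bits)
print(sums)
```

Changing only the stored configuration labels, for example to a single
"LS" at one end and a single "MS" at the other, would turn the same
row into one 8-bit unit.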

An alternative which may be more convenient in some
circumstances is to control whether or not an element takes account
of received carry data, rather than whether or not it sends and
receives carry data. In that case, only the least significant bit
(which does not take account of a received carry) must be
identified. The nature of direction selection operations would
then be uniform throughout the unit.

The configuration codes can conveniently be encoded as two-bit 
binary words stored in the processing elements. This provides four 
distinct configuration codes. Three are used to configure elements 
to act as interior, least significant or most significant bits. It 
has been found advantageous to use the fourth to configure elements 
as data buses, in which the input is connected directly to the 
output so that data is passed without processing. The data can 
also be stored by each processing element through which it passes. 
This enables a single bit sent from the least significant processing 
element of a processing unit to be distributed to the other elements 
of the unit, as required during multiplication. 

The bus configuration can be used to close an otherwise
unclosed loop, or to shorten the effective length of a loop
processing unit. For instance, the ten bit processing unit 22 in
Fig. 4 can be used as an eight bit unit by configuring two of its
processing elements as buses.
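
Shortening a loop with bus elements can be sketched as below. The
model is an assumption made for illustration: the ring is a Python
list in shift order, `ring_shift` is a hypothetical name, the choice of
which two elements act as buses is arbitrary, and bus elements are
taken not to store the data passing through them (the text notes that
they may also do so).

```python
def ring_shift(values, is_bus):
    """One cyclic shift around a closed loop of elements.

    values[i] is the bit held by element i; data moves from element i-1
    to element i. An element configured as a bus passes its input
    straight to its output, so each ordinary element receives the bit
    held by the nearest ordinary element upstream of it.
    """
    n = len(values)
    shifted = list(values)
    for i in range(n):
        if is_bus[i]:
            continue
        j = (i - 1) % n
        while is_bus[j]:          # skip over bus elements
            j = (j - 1) % n
        shifted[i] = values[j]
    return shifted


# A ten-element loop with two elements (here 3 and 7) acting as buses.
is_bus = [i in (3, 7) for i in range(10)]
start = list(range(10))          # distinct markers instead of bits
state = start
for _ in range(8):
    state = ring_shift(state, is_bus)
print(state == start)
```

With two buses the remaining eight elements return to their starting
contents after eight shifts rather than ten.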

The configuration code is effective when global instructions
command the use of the connection network between the processing
elements, for instance for shifting operations and ripple carry
addition. Figs. 7 to 12 show some of the types of closed loop
processing units which can be constructed by global instructions
whose effect is modified by the configuration code. These figures
are schematic and only show the connections between processing
elements. Each connection between processing elements is two bits
wide. When data is passing between elements, each element receives
an input bit on each of the one-bit buses P and S, and supplies an
output bit via each of the one-bit buses T and N. Carry data or
operand data may be passed between connected elements.

A table is shown in each processing element to indicate the
contents of the stores storing the direction selection codes and the
configuration code. The contents of the "left" and the "right"
stores indicate the input direction to be selected for a left or
right shift of data. The terms left shift and right shift refer to
the change of significance of bits of data during a shift, by
analogy with normal binary notation. A left shift moves data
towards the most significant end of a processing unit. A right
shift moves data towards the least significant end of the processing
unit. The symmetry of the arrangement provides that the "right"
store stores the output direction for a left shift and the "left"
store stores the output direction for a right shift. The contents
of the store labelled "config" indicate the configuration code
configuring elements as least significant bit processors (LS), most
significant bit processors (MS) or interior bit processors (bit).

Fig. 7 shows the connections made between processing elements
upon receipt of a global instruction to perform a right shift of
data non-cyclically around the ring. That is, in elements 30, 32
with the configuration code "bit", the contents of "left" are used as
the output direction selection code and the contents of "right" as
the input direction selection code.

Element 34 has the configuration code "LS". It takes its input
from element 32, in view of the direction selection code stored in
"right", but its output is discarded because the shifting is
non-cyclic. That is, no direct communication is required between the
least and most significant bits.

Element 36 has the configuration code "MS" and provides an
output to element 30 under the control of the direction selection
code stored in "left", but takes as its input a bit (0 or 1) set by
the function generator circuits 56, 58 under the control of the
global control circuit.

The overall effect of the instruction, interpreted by the
configuration codes stored in the elements, is for the composite
processing unit formed by the elements 30, 32, 34, 36 to shift data
non-cyclically towards the least significant bit.

Fig. 8 shows the same processing unit as Fig. 7, upon
receipt of a global instruction to perform a non-cyclic left shift.
In this case, elements 30, 32 and 36 take the contents of "left" as
the input direction selection code. The elements 30, 32 and 34
take the contents of "right" as the output direction selection code.
The output of the most significant element 36 is discarded. The
input of the least significant element 34 is set by global control.

The overall effect of the instruction is to shift data non- 
cyclically towards the most significant end of the processing unit. 
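The non-cyclic shifts of Figs. 7 and 8 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit of the specification: the "left" and "right" stores hold the reference numeral of a neighbouring element rather than a two-bit direction code, and the names `Element`, `make_unit`, `shift` and `read` are hypothetical. Following the text, "right" selects the input for a right shift and "left" selects the input for a left shift.

```python
from dataclasses import dataclass


@dataclass
class Element:
    left: int      # neighbour on the less significant side (hypothetical encoding)
    right: int     # neighbour on the more significant side
    config: str    # "LS", "MS" or "bit"
    data: int = 0


def make_unit():
    """Ring of elements 34 (LS), 32, 30, 36 (MS), keyed by reference numeral."""
    return {
        34: Element(left=36, right=32, config="LS"),
        32: Element(left=34, right=30, config="bit"),
        30: Element(left=32, right=36, config="bit"),
        36: Element(left=30, right=34, config="MS"),
    }


def shift(unit, direction, fill=0):
    """Apply one global non-cyclic shift instruction to every element at once."""
    old = {name: e.data for name, e in unit.items()}
    for e in unit.values():
        if direction == "right":    # towards the least significant end (Fig. 7)
            source, entry = e.right, "MS"
        else:                       # "left": towards the most significant end (Fig. 8)
            source, entry = e.left, "LS"
        # The element at the entry end takes the bit set by global control;
        # every other element reads from the neighbour its store selects.
        e.data = fill if e.config == entry else old[source]


def read(unit):
    """Return the stored word, most significant bit first."""
    return [unit[name].data for name in (36, 30, 32, 34)]
```

In this model every element receives the same instruction, and only the stored configuration code decides whether an element reads from its neighbour or from the globally supplied fill bit.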

Fig. 9 shows the same processing unit receiving a global
instruction to shift data to the left and non-cyclically. The
operation shown in Fig. 9 differs from that shown in Fig. 8 in one
important respect. The output of the element 36 on the 'T' line is
not discarded, but is applied to the input of the least
significant element 34. The 'N' output of element 36 is discarded.
In all other connections shown in Fig. 9, 'T' outputs are applied to
'P' inputs and 'N' outputs are applied to 'S' inputs. Within all
four elements 30, 32, 34, 36, data received on 'P' and 'S' inputs is
passed to 'T' and 'N' outputs respectively. In other
circumstances it may be appropriate to send data received on 'P' and
'S' inputs to 'N' and 'T' outputs, respectively.

The effect of these connections is that data set by the global
control circuits at the 'P' input of the least significant element
34 travels twice around the processing unit before being discarded
on leaving the 'N' output of the most significant element 36. The
first circuit around the unit follows the route from 'P' inputs,
internally of elements to 'T' outputs, then over connections to the
next 'P' input and so on until leaving the 'T' output of the element
36. The data then commences its second circuit, from 'S' inputs,
internally to 'N' outputs, then over connections to the next 'S'
input and so on until being discarded on leaving the 'N' output of
the element 36.
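The double circuit described above behaves like an eight-stage shift register folded onto four elements. The following sketch assumes, for illustration only, that each element holds one bit on its 'P' to 'T' route and one bit on its 'S' to 'N' route, and that each global instruction advances every bit by one element. The function name `double_length_shift_left` is hypothetical.

```python
def double_length_shift_left(p_bits, s_bits, fill=0):
    """Advance the folded eight-stage register by one step.

    Both lists are ordered from the least significant element (34)
    to the most significant element (36).
    """
    # First circuit: each 'T' output feeds the next element's 'P' input;
    # the 'P' input of element 34 is set by global control.
    new_p = [fill] + p_bits[:-1]
    # Second circuit: the 'T' output of element 36 feeds the 'S' input of
    # element 34, and each 'N' output feeds the next element's 'S' input.
    new_s = [p_bits[-1]] + s_bits[:-1]
    # The 'N' output of element 36 (s_bits[-1]) is discarded.
    return new_p, new_s
```

A bit inserted at the 'P' input of element 34 reaches the 'N' route of element 36 after eight steps in this model and is lost on the ninth.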

The operation of the least significant bit is modified by the
configuration code in one respect. Each element 30, 32, 34, 36
comprises a carry store 40. The least significant element takes
its carry input from the carry store 40 and stores there a carry
received from the most significant element 36. In the other
elements 30, 32, 36, generated carries are stored but not used.
They need not be stored, but it is convenient to do so because they
might be required for later, different operations.

Before the commencement of addition, the contents of the carry
store 40 of the element 34 are set by the global control circuit.
At the end of a single addition operation, the register 40 of the
element 34 contains the carry from the element 36. The register 40
can then be reset by the global control circuit if the unit is next
to perform an unrelated addition operation. Alternatively, the
connection between the least and most significant elements 34 and 36
enables addition to be performed on words too long to be
accommodated in the unit, in the following way. Initially, the
unit performs an addition of the four least significant bits of the
operands. This provides a result which is stored and one
significant carry bit which is stored in the carry register 40 of
the element 34. A further addition operation can immediately be
performed on the four operand bits of next highest significance.
This operation can take account of the stored carry from the
previous operation, so that the result produced, along with the
result previously produced, taking due account of the significance
of bits, represents the true sum of the operands. Operands of any
length may be operated on by this method of dividing the operands
into four bit words, beginning at the least significant bit, and
performing ripple-carry addition on the words sequentially, storing
and taking account of the carry produced by the previous addition of
two words.
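The word-sequential addition described above can be sketched as follows. This is an illustrative model only: `add_word` stands in for one ripple-carry addition performed by the four-element unit, the variable `carry` plays the part of the carry store 40 of element 34, and all names are hypothetical.

```python
WIDTH = 4  # bits handled by the four-element unit in one operation


def add_word(a, b, carry_in):
    """Stand-in for one ripple-carry addition in the unit: returns (sum, carry)."""
    total = a + b + carry_in
    return total & ((1 << WIDTH) - 1), total >> WIDTH


def add_long(a, b):
    """Add non-negative integers of any length, one 4-bit word at a time."""
    mask = (1 << WIDTH) - 1
    carry = 0          # carry store of the LS element, preset by global control
    result = 0
    position = 0
    while (a >> position) or (b >> position):
        word_sum, carry = add_word((a >> position) & mask,
                                   (b >> position) & mask, carry)
        result |= word_sum << position   # place the word at its significance
        position += WIDTH
    return result | (carry << position)  # final carry becomes the top bit
```

The only state carried from one word to the next is the single carry bit, which is why the unit can start on the next four bits immediately.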

Fig. 13 shows a single processing element 41 in more detail. 

Inputs from neighbouring elements arrive over two-bit input 
buses 42. Direction selection means 44 selects one of the input 
buses 42, under logical control to be described later, to be 
connected to the bus 46 for the transmission of data into the 
interior of the element 41. The input is divided into its
components and can pass over buses P and S to memory.

Operands can be read from memory onto two buses T and N, where
they can be supplied to output direction selection means 48. The
output direction selection means 48 select one of the output buses
50 for onward transmission of the output to the corresponding
neighbouring element.

Two other types of output data are possible. A carry output 
may arrive over the line 51, or the received input may be sent 
directly to the output, over the bus 52. The bus 52 is used when 
the element is in the "bus" configuration described above.

One of the three possible outputs is selected under the control 
of three logical signals labelled e^, 63 and h^^ which open and 
close respective gates 54.

Operands on the buses labelled T and N are also supplied to a 
primary function generator 56, a secondary function generator 58 and 
a carry kill function generator 60. The function generators 56 and
58 are controlled by 4-bit global control signals arriving over
control buses FP and FS. Consequently, each may perform a total of 
16 different functions. 
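The count of 16 follows because a 4-bit control word can serve as the complete truth table of a function of two one-bit operands. A minimal sketch, assuming a particular and purely hypothetical ordering of the truth-table rows (the specification does not state which control bit corresponds to which operand combination):

```python
def function_generator(control, t, n):
    """Look up the output bit for operands (t, n) in a 4-bit control word."""
    index = (t << 1) | n          # hypothetical ordering of the truth-table rows
    return (control >> index) & 1


def truth_table(control):
    """Outputs for (t, n) = (0,0), (0,1), (1,0), (1,1)."""
    return tuple(function_generator(control, t, n)
                 for t in (0, 1) for n in (0, 1))
```

Under this ordering the control word 0b0110 gives exclusive OR and 0b1000 gives AND; the 16 possible control words give 16 distinct truth tables.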

The output of the primary and secondary functional units, when 
performing logical operations, is selected by the switch 59 onto the 
buses P and S respectively, for storage without further processing. 
If required, results can subsequently be read onto buses T and N and 
transmitted to another element. 

The primary and secondary circuits 56, 58 work together to 
perform arithmetic functions. The result is applied to an
exclusive OR gate 62 which combines it with a carry bit arriving 
over the line 64 to generate a final result. The final result is 
sent to storage over the bus P. 

The carry bit is provided either from a carry register 66 or
from a neighbouring element, according to the setting of the 
switch 68. 

The new carry is generated by a carry generator 70 under the 
control of the primary and secondary function generators 56, 58 and 
the carry kill function generator 60. The carry register may be
loaded either from the carry generator 70 or by a neighbouring 
element, according to the setting of the switch 72. 
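One conventional way to realise a single-bit addition with these parts is sketched below. The roles given to the three function generators (propagate, generate and kill) are an assumption made for illustration; the specification states only that the carry generator 70 is controlled by them and that the exclusive OR gate 62 combines the arithmetic result with the incoming carry. The function name `element_add` is hypothetical.

```python
def element_add(t, n, carry_in):
    """Single-bit addition of operands t and n with an incoming carry."""
    propagate = t ^ n        # assumed output of primary function generator 56
    generate = t & n         # assumed output of secondary function generator 58
    kill = (t | n) ^ 1       # assumed output of carry kill function generator 60
    result = propagate ^ carry_in   # exclusive OR gate 62
    if kill:
        carry_out = 0        # both operands 0: no carry can leave
    elif generate:
        carry_out = 1        # both operands 1: a carry always leaves
    else:
        carry_out = carry_in  # otherwise the incoming carry is passed on
    return result, carry_out
```

For every combination of inputs the pair (result, carry_out) equals the two-bit sum of t, n and carry_in.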

Since two buses P, S are available for writing to the element's
memory, but arithmetic operations only generate a single result bit,
it is convenient to provide an input (RAM input) from memory in the
global control unit, so that the spare capacity of the S bus can be
used to read in fresh data while a result is being stored over the
bus P.

Control of the operation of the element 41 is provided in the
following way. Twelve bits of control information arrive from the
global control circuits. Four of these bits arrive over the bus FP
and another four over the bus FS. The remaining four, labelled I0,
I1, I2 and I3, form a global instruction word which specifies the
type of operation to be performed, for instance one of those
described with relation to Figs. 7 to 12. The bits have the
following significance:

Table 2

Bit  Applies to               Value 0                  Value 1
I3   all instructions         arithmetic and logical   shift
I2   arithmetic and logical   logical                  arithmetic
I2   shift                    plane                    cyclic
I1   logical                  global                   local
I1   arithmetic               carry save add           ripple carry add
I1   plane shift              single length            double length
I1   cyclic shift             single length            double length
I0   [entries illegible]
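Reading Table 2 as a decision tree on the bits I3, I2 and I1, the instruction word can be decoded as in the sketch below. The bit labels follow that reading and are an assumption. I0 is left out because its entries are not legible in this copy, and the function name `decode` is hypothetical.

```python
def decode(i3, i2, i1):
    """Describe the operation selected by instruction bits I3, I2 and I1."""
    if i3 == 0:                                   # arithmetic and logical
        if i2 == 0:                               # logical
            return "logical, " + ("local" if i1 else "global")
        return "arithmetic, " + ("ripple carry add" if i1 else "carry save add")
    kind = "cyclic" if i2 else "plane"            # shift
    return "shift, " + kind + ", " + ("double length" if i1 else "single length")
```

For example, `decode(0, 0, 0)` selects the global logical function and `decode(0, 1, 1)` selects ripple carry addition.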



Six further bits of control information are stored in the
associated memory and are individually set for each processing
element. Two bits represent the address of the "left" element (as
discussed above in relation to Figs. 7 to 12) and two more represent
the address of the "right" element. These are stored in registers
labelled "left" and "right" in Fig. 13. The contents of "left" and
"right" are applied one to the input direction selection means 44
and one to the output direction selection means 48 depending on the
state of the switch 74. Thus, the direction of data flow around a
ring processing unit can be reversed simply by changing the state of
the switch 74 in each element. The state of the switch 74 is set

by 

The final two bits of control data C0 and C1 are stored in the
configuration register labelled "configuration" in Fig. 13. C0 and
C1 have the following significance:

Table 3

bit = 1
LS = 0
MS = 1



I0 to I3 are combined by logic circuits (not
shown in the drawings) to generate the remaining control signals
required in the element, according to the following equations:

The condition "In" represents the condition in which data
enters one end of a processing unit (or is set by global control).
This becomes significant for an element during ripple carry
addition, for example, if the element is configured as the least
significant element of a unit and the carry is to the left (in terms
of data significance) or if the element is the most significant and
the carry direction is to the right, and similarly during shift
operations. The carry store 66 is loaded from the carry generator
70 unless a ripple add is being performed and the "In" condition is
true. Otherwise during ripple carry, the received carry bypasses
the carry register 66 through the switch 68. The carry out is
passed directly to the output over the line 51.
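The "In" condition and the loading rule for the carry store 66 can be written out as a short sketch. It follows the wording of the paragraph above; the string values used for the configuration and direction, and the function names, are hypothetical. The alternative source ("neighbour") reflects the earlier statement that the carry register may be loaded by a neighbouring element.

```python
def in_condition(config, direction):
    """True when data enters the processing unit at this element."""
    return (config == "LS" and direction == "left") or (
        config == "MS" and direction == "right")


def carry_store_source(ripple_add, config, direction):
    """Where carry store 66 is loaded from for the current instruction."""
    if ripple_add and in_condition(config, direction):
        return "neighbour"     # carry arriving from the other end of the unit
    return "generator"         # carry generator 70
```

Only the element at the entry end of a unit performing a ripple add stores the incoming carry; every other element stores its own generated carry.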

The global logical function, executed when I3 = I2 = I1 = 0, is
chosen when the array is constructed. It is a function built into
the hardware of the connections between the processing elements, and
can operate on data passing through elements configured as buses.
The function can be chosen so that the connections form a
multi-input logic gate. If the array is built in NMOS technology, the
global function can conveniently be the NOR function. In bipolar
technology, the OR or AND function might be used.

Although the processing element described above uses direction 
selection means to select both the input and the output directions, 
the invention could be performed with only one direction selection 
means selecting either the input or the output direction. In that
case, output would be transmitted or input accepted respectively 
from all directions. 

The described connections between elements are each 2 bits 
wide, and each processing element comprises two function generators 
under individual control. These numbers could be varied. In 
particular, the array could be simplified by providing a single 
function generator in each element, one bit data paths between 
elements and one one-bit bus for loading the element's memory. 
Naturally, the processing power of the elements would be weaker, and 
software would become more complex. 

The above description and the drawings refer to and show 
elements on a square lattice. For practical reasons, it may not be 
desirable or possible to build an array with processing elements 
laid out geometrically. The term 'square lattice' and other 
geometrical terms used are intended to refer equally to an array
whose layout is topologically equivalent to the one shown and 
described, as they refer to the layout shown and described.

CLAIMS

1. A processor array (16) comprising a plurality of processing
elements (12,18,41), each capable of combining operands to produce
results and carry data, control means (67) operable to control the
processing elements by supplying single instructions to all of the
elements simultaneously, and a connection network (14,42,50)
connecting each processing element to a plurality of other
processing elements for the transmission of data between elements,
characterised in that each element can send operands or carry data,
selectively, to any other element to which it is connected by the
connection network, and in that each element comprises storage means
(43a, left) for storing a direction selection code and direction
selection means (44) responsive to the contents of the storage means
to select the element from which transmitted data is accepted or the
element to which data is transmitted.

2. A processor array according to claim 1, characterised in that the
direction selection means (44) selects solely the element (12,18,41) 
from which transmitted data is accepted. 

3. A processor array according to claim 1, characterised in that
the direction selection means (44) selects solely the element
(12,18,41) to which data is transmitted.

4. A processor array according to claim 1, characterised in that 
each processing element (12,18,41) comprises further storage means 
(43b, right) for storing a further direction selection code, and 
further direction selection means (48) responsive to the further 
direction selection code, and in that the direction selection means 
(44,48) of each processing element select the element from which 
data is accepted and the element to which data is transmitted in 
dependence on the stored direction selection codes. 

5. A processor array according to claim 4, characterised in that 
the direction selection means (44) and the further direction 
selection means (48) respond to the contents of a respective one of
the storage means and the further storage means selected in response
to a command from the control means, whereby the significances of
the direction selection code and the further direction selection
code may be interchanged.

6. A processor array according to any preceding claim,
characterised in that each connection (14,42,50) between processing
elements (12,18,41) comprises two data paths, whereby connected
elements may exchange data in a single operation.

7. A processor array according to any preceding claim,
characterised in that the processing elements (12,18,41) are
interconnected by connections (14,42,50) extending in directions so
chosen that closed rings (22) of intercommunicating elements may be
formed by the operation of the direction selection means (44,48).

8. A processor array according to any preceding claim, 
characterised in that the processing elements (12,18,41) form a two- 
dimensional array. 

9. A processor array according to claim 8, characterised in that 
the processing elements (12,18,41) occupy positions which form a 
square lattice. 

10. A processor array according to claim 9, characterised in that
each processing element (12,18,41) is connected to the four elements 
which are its nearest neighbours on the lattice. 

11. A processor array according to any preceding claim,
characterised in that each data path (14,42,50) between processing
elements is more than one bit wide. 

12. A processor array according to any preceding claim,
characterised in that each processing element (12,18,41) comprises
memory means (45), a plurality of function generators (56,58)
operable to process operands read from the memory means to generate
respective results, and result buses (P,S) connecting respective
function generators to the memory means for the storage of results.

13. A processor array according to claim 12, characterised in that
each element (12,18,41) comprises a connection (RAM INPUT) between
at least one result bus (S) and the control means (67), whereby the
control means may write new data into the memory means (45)
simultaneously as result data is written into the memory means over
the other result buses (P).

14. A processor array according to any preceding claim,
characterised in that each processing element (12,18,41) comprises a
configuration store (CONFIGURATION) which stores an instruction from
a set of configuration instructions, and comprises means responsive
to the stored configuration instruction to modify the response of
the processing element to instructions from the control means, and
in that respective configuration instructions determine whether a
processing element acts as the most significant or the least
significant portion of, or a portion of intermediate significance in
a processing unit (20,22,26) formed by a chain of processing
elements each of which communicates with the elements to either side
of it in the chain.

15. A processor array according to claim 14, characterised in that
a further configuration instruction causes processing elements
(12,18,41) to transmit received data without processing.

16. A processor array according to claim 14 or 15, characterised in
that the configuration instructions are encoded as two-bit binary
words.

17. A processor array according to any preceding claim,
characterised in that the or each direction selection code is a two- 
bit binary word. 

18. A processor array according to any preceding claim,
characterised in that each processing element (12,18,41) is a
single-bit processor.




[Drawings: Fig. 1; Fig. 6 (elements 30, 34, 36 with LEFT, RIGHT and
CONFIG stores; configuration codes LS, MS, BIT).]